Add similarity search functions for tasks and tickets #166
dolliecoder wants to merge 1 commit into AOSSIE-Org:main
📝 Walkthrough
This pull request adds two PostgreSQL PL/pgSQL functions for semantic similarity search.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@supabase/migrations/20251021110000_task_ticket_vector_search.sql`:
- Around line 17-20: Add HNSW vector indexes so the <=> similarity ORDER BY on
description_embedding uses an index: create IF NOT EXISTS idx_tasks_embedding on
tasks using hnsw for column description_embedding with vector_cosine_ops, and
create IF NOT EXISTS idx_tickets_embedding on tickets similarly for
description_embedding; ensure the new indexes are applied before running
similarity queries (also add the same index when you see other queries ordering
by description_embedding <=> query_embedding).
🧹 Nitpick comments (1)
supabase/migrations/20251021110000_task_ticket_vector_search.sql (1)
1-23: Consider LANGUAGE sql and marking as STABLE.
Since the function body is a single RETURN QUERY SELECT, PL/pgSQL is unnecessary overhead; plain LANGUAGE sql avoids the PL/pgSQL executor layer. Additionally, these functions have no side effects and return consistent results for the same inputs within a transaction, so marking them STABLE lets the planner optimize repeated calls.
♻️ Suggested diff
 CREATE OR REPLACE FUNCTION get_similar_tasks(
   query_embedding vector(768),
   match_count INT DEFAULT 3
 )
 RETURNS TABLE (
   task_id UUID,
   title TEXT,
   description TEXT,
   similarity FLOAT
-) AS $$
-BEGIN
-  RETURN QUERY
+) LANGUAGE sql STABLE AS $$
   SELECT
     t.id AS task_id,
     t.title,
     t.description,
     1 - (t.description_embedding <=> query_embedding) AS similarity
   FROM tasks t
   WHERE t.description_embedding IS NOT NULL
   ORDER BY t.description_embedding <=> query_embedding
   LIMIT match_count;
-END;
-$$ LANGUAGE plpgsql;
+$$;
Same applies to get_similar_tickets below.
    1 - (t.description_embedding <=> query_embedding) AS similarity
  FROM tasks t
  WHERE t.description_embedding IS NOT NULL
  ORDER BY t.description_embedding <=> query_embedding
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the migration file
find . -name "*20251021110000_task_ticket_vector_search.sql" -type f
Repository: AOSSIE-Org/Ell-ena
Length of output: 128
🏁 Script executed:
#!/bin/bash
# Check all SQL files for vector index creation patterns
rg -n 'hnsw|ivfflat|vector_cosine_ops|vector_l2_ops' --type sql
Repository: AOSSIE-Org/Ell-ena
Length of output: 44
🏁 Script executed:
#!/bin/bash
# Also search for CREATE INDEX with vector patterns
rg -n 'CREATE INDEX.*embedding|CREATE INDEX.*vector' --type sql -i
Repository: AOSSIE-Org/Ell-ena
Length of output: 44
🏁 Script executed:
#!/bin/bash
# List all migration files to understand the structure
find . -path "*/migrations/*.sql" -type f | sort
Repository: AOSSIE-Org/Ell-ena
Length of output: 743
🏁 Script executed:
#!/bin/bash
# Read the migration file to see its full content
cat -n ./supabase/migrations/20251021110000_task_ticket_vector_search.sql
Repository: AOSSIE-Org/Ell-ena
Length of output: 1518
🏁 Script executed:
#!/bin/bash
# Also check the meeting vector search migration for comparison
cat -n ./supabase/migrations/20251021090000_meeting_vector_search.sql
Repository: AOSSIE-Org/Ell-ena
Length of output: 4913
Add vector indexes for efficient similarity queries.
The <=> operator in both ORDER BY clauses will fall back to sequential scans without indexes. Add HNSW indexes to enable efficient vector search:
CREATE INDEX IF NOT EXISTS idx_tasks_embedding
  ON tasks USING hnsw (description_embedding vector_cosine_ops);
CREATE INDEX IF NOT EXISTS idx_tickets_embedding
  ON tickets USING hnsw (description_embedding vector_cosine_ops);
This is critical once the tables grow beyond a few thousand rows. If PR1 creates these indexes separately, this reminder can be disregarded.
Also applies to: 42-45
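One way to check whether the planner actually picks up such an index is to run EXPLAIN on the same ordering the functions use. This is a sketch that assumes the two indexes above have been created; the constant query vector built with array_fill is a placeholder for illustration, not a real embedding. On very small tables the planner may still choose a sequential scan even when the index exists.

```sql
-- Inspect the plan for the ordering used inside get_similar_tasks.
-- If idx_tasks_embedding is used, the plan shows an index scan on it;
-- a "Seq Scan" feeding a "Sort" node means no vector index was used.
EXPLAIN
SELECT t.id
FROM tasks t
WHERE t.description_embedding IS NOT NULL
ORDER BY t.description_embedding
         <=> array_fill(0.1::real, ARRAY[768])::vector(768)  -- placeholder vector (assumption)
LIMIT 3;
```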
I left indexing out for now on purpose because I wanted to keep this PR focused on adding the similarity functions for tasks and tickets without expanding the scope too much. I can open another PR for indexing as the tables grow.
@dolliecoder, that makes sense! Keeping the PR focused is a good approach.
Would you like me to create a follow-up issue to track adding the HNSW indexes for tasks.description_embedding and tickets.description_embedding? This way it won't be forgotten as the tables grow.
Just checking in on this PR. I will be happy to revise based on reviews. Please let me know if there’s anything I can adjust to help move this forward.
This PR introduces SQL helper functions to enable vector similarity search for tasks and tickets using the pgvector setup added in PR1.
It is a follow-up, incremental step toward Issue #65. While PR1 introduced the description_embedding vector(768) columns for tasks and tickets (storage layer), this PR builds on that foundation by adding database-level similarity search functions (retrieval layer).
No embedding generation, indexing, or AI service integration is included here. This PR strictly enables semantic retrieval capability at the database level.
Dependency Note:
This PR depends on PR1, as it relies on the description_embedding columns introduced there. PR1 must be merged before this PR so that the functions execute against an existing schema. PR1: #160
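A quick way to confirm the dependency is satisfied before applying this migration is to look up the columns in the catalog. This is a minimal sketch that assumes the tables live in a schema on the current search path and are named tasks and tickets, as in this PR; two returned rows would indicate that both columns from PR1 are present.

```sql
-- Check that PR1's embedding columns exist on both tables.
-- For a pgvector column, udt_name is expected to read 'vector'.
SELECT table_name, column_name, udt_name
FROM information_schema.columns
WHERE table_name IN ('tasks', 'tickets')
  AND column_name = 'description_embedding'
ORDER BY table_name;
```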
Changes Made
- Added get_similar_tasks(query_embedding, match_count) SQL function
- Added get_similar_tickets(query_embedding, match_count) SQL function
- Each function:
  - Computes cosine similarity as 1 - (description_embedding <=> query_embedding), where <=> is pgvector's cosine distance operator
  - Returns the top-k most semantically similar rows, with k given by match_count
  - Skips rows without embeddings (description_embedding IS NOT NULL)
- Added a new Supabase migration file to maintain proper migration ordering
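A usage sketch for the two functions, assuming the pgvector extension is installed and the description_embedding vector(768) columns from PR1 exist. The constant query vector built with array_fill is only a placeholder; a real caller would pass an embedding produced by whatever model populates the columns. The result columns of get_similar_tickets are not listed in this description, so the second query selects * rather than guessing them.

```sql
-- Top 5 tasks closest to a placeholder query vector (assumption: real
-- callers pass a model-generated 768-dimensional embedding instead).
SELECT task_id, title, similarity
FROM get_similar_tasks(
  array_fill(0.1::real, ARRAY[768])::vector(768),
  5  -- match_count
);

-- Same call shape for tickets, with match_count passed explicitly.
SELECT *
FROM get_similar_tickets(
  array_fill(0.1::real, ARRAY[768])::vector(768),
  5
);
```

Rows come back ordered by ascending cosine distance, so the first row has the highest similarity value.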
✅ Checklist
I have read the contributing guidelines.
I have added tests that prove my fix is effective or that my feature works.
(Not applicable – database-level capability addition only.)
I have added necessary documentation (if applicable).
(Not required at this stage.)
Any dependent changes have been merged and published in downstream modules.
(Depends on PR1 – embedding schema changes.)